

Final  Report 

Models  for  Multidimensional  Tests 
and  Hierarchically  Structured  Training  Materials 

Mark  D.  Reckase 


Research  Report  ONR85-1 
May  1985 


The  American  College  Testing  Program 
Assessment  Programs  Area 
Test  Development  Division 
Iowa  City,  Iowa  52243 




Prepared  under  Contract  No.  N00014-81-K0817
with  the  Personnel  and  Training  Research  Programs 
Psychological  Sciences  Division 
Office  of  Naval  Research 


Approved  for  public  release:  distribution  unlimited. 
Reproduction  in  whole  or  in  part  is  permitted  for 
any  purpose  of  the  United  States  Government. 


REPORT DOCUMENTATION PAGE

1a. Report security classification: UNCLASSIFIED
3. Distribution/availability of report: Approved for public release: distribution unlimited. Reproduction in whole or in part is permitted for any purpose of the United States Government.
5. Monitoring organization report number(s): ONR 85-1
6c. Address (City, State, and ZIP Code): P.O. Box 168, Iowa City, IA 52243
7a. Name of monitoring organization: Personnel & Training Research Programs, Office of Naval Research
7b. Address (City, State, and ZIP Code): Arlington, VA 22217
9. Procurement instrument identification number: N00014-81-K0817
10. Source of funding numbers: Program element no. 61153N; Project no. RR042-04; Work unit accession no. NR150-474
11. Title (include security classification): Models for multidimensional tests and hierarchically structured training materials.
13b. Time covered: from 81SEP01 to 85FEB28
14. Date of report (year, month, day): 1985, May 20
18. Subject terms: Item response theory; Latent trait theory; Learning hierarchies; Multidimensional models

19. Abstract

Work  on  item  response  theory  was  extended  to  include  two  areas  that  had  not  been 
extensively  researched  previously.  They  include  models  for  test  items  that  require 
more  than  one  ability  for  a  correct  response  and  models  for  the  interaction  between 
modules  of  instruction  that  have  a  hierarchical  relationship.  For  both  of  these  types 
of  models,  estimation  procedures  were  developed  for  model  parameters  and  extensive 
work  was  done  to  determine  the  appropriate  interpretation  of  the  parameter  values. 

This  report  is  a  summary  of  work  performed  on  these  models  over  a  three-year  period.


20. Distribution/availability of abstract: Unclassified/unlimited
22a. Name of responsible individual: Dr. Charles Davis
22b. Telephone (include area code): (202) 696-4046

DD Form 1473. The 83 APR edition may be used until exhausted; all other editions are obsolete.

Security classification of this page: Unclassified


Contents

Introduction
Development and Evaluation of MIRT Models
  Analysis of the General Rasch Model
  Interpretation of the Model Parameters
  Summary and Conclusions
Models for Performance on Hierarchically Structured Training Materials
  The Module Characteristic Curve Model
  Summary and Conclusions
References


Final  Report 


Models  for  Multidimensional  Tests 
and  Hierarchically  Structured  Training  Materials

Since the 1950s, there has been increasing interest in psychological and
educational measurement that is based upon probabilistic models of the
interaction between a person and a test item. These model-based procedures
demonstrate how strong assumptions can be used to gain increased control over
the measurement process. For example, using item response theory (IRT), the
precision of measurement at every point along an ability scale can be
determined. Also, items can be selected from a pool to form a test with any
desired level of precision at any point on the score scale.

The  strong  assumptions  needed  for  these  model-based  procedures  are 
basically  that  the  probabilistic  model  that  has  been  selected  accurately 
reflects  the  test  data,  and  that  local  independence  holds  for  the  model.  This 
latter  assumption  means  that  the  response  to  one  item  does  not  affect  the 
response  to  another  item,  and  that  the  response  by  one  person  does  not  affect 
the  response  by  another  person. 
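Local independence is what allows the probability of a whole response pattern to be written as a product of item-level probabilities. A minimal sketch, again assuming the 2PL model and hypothetical item parameters: the likelihood of a 0/1 pattern is the product of $P$ for correct responses and $1 - P$ for incorrect ones. Summing over all eight patterns for three items returns 1, a quick check that the product form defines a proper distribution.

```python
import math
from itertools import product

def p_2pl(theta, a, b, D=1.7):
    """Unidimensional 2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def pattern_likelihood(responses, theta, items):
    """Probability of a 0/1 response pattern under local independence:
    given theta, the joint probability is the product of item probabilities."""
    likelihood = 1.0
    for x, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        likelihood *= p if x == 1 else 1.0 - p
    return likelihood

items = [(1.0, -0.5), (1.2, 0.0), (0.9, 0.8)]  # hypothetical (a, b) values
theta = 0.3

# Summed over all 2^3 possible patterns, the likelihoods total 1.
total = sum(pattern_likelihood(pat, theta, items) for pat in product((0, 1), repeat=3))
print(round(total, 10))
```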

Most of the current models assume that the measuring instrument measures
only a single trait (Rasch, 1960; Lord, 1952; Birnbaum, 1968). For many
tests this assumption is at least approximately met, but for other tests it is
unlikely to be met at all. Most of the current models are also limited to
describing a person's response to a single item. In some cases this
limitation may make it difficult to solve some measurement problems.

The  purpose  of  the  research  done  on  this  contract  was  to  extend  the  types 
of  models  available  for  model-based  measurement.  Two  types  of  extensions  were 
considered.  The  first  was  an  extension  of  item  response  theory  models  to  the 
case  where  the  measurement  device  was  not  assumed  to  be  measuring  a  single 
dimension.  These  models  were  labelled  multidimensional  item  response  theory 
(MIRT)  models. 

The  second  type  of  extension  was  to  cases  where  sets  of  related  items 
were  considered  as  a  unit.  These  related  sets  of  items  were  assumed  to  be 
measuring  educational  constructs  that  could  be  arranged  into  a  hierarchy  that 
facilitated  learning.  These  models  could  be  used  to  determine  the 
interrelationship  between  the  constructs  in  the  hierarchy  and  the  level  that 
must  be  reached  on  each  construct  before  a  person  should  be  moved  on  to  the 
next  higher  level  of  the  hierarchy.  Models  for  tests  used  with  hierarchically 
arranged  instructional  units  were  labelled  models  for  hierarchically 
structured  tests  (HST). 

The  approach  taken  to  develop  and  evaluate  the  MIRT  and  HST  models  was  to 
first  logically  evaluate  the  characteristics  of  potential  models,  then  to 
develop  estimation  procedures  for  the  parameters  of  the  models,  and  finally  to 
evaluate  the  models  on  their  ability  to  describe  real  test  data.  These  steps 
were  performed  separately  for  a  wide  class  of  models  of  each  type.  The 
results  of  the  research  will  now  be  described  for  each  type  of  model,  with  the 
analysis  of  the  MIRT  models  being  presented  first.  Only  a  summary  of  the 
outcome  of  the  research  will  be  presented  here,  but  references  will  be  made  to 
papers  and  technical  reports  that  contain  the  details  of  the  research  efforts. 

The  Development  and  Evaluation  of  MIRT  Models 

The class of possible multidimensional, probabilistic models of the
interaction between a person and a test item is essentially infinite in
size. Any expression that maps a vector of abilities into a probability could
be considered as a MIRT model.

Therefore,  the  first  step  in  the  research  effort  was  to  limit  the 
possible  models  to  a  manageable  subset.  This  was  done  by  reviewing  the 
literature  to  determine  what  MIRT  models  had  been  proposed.  The  review 
identified  three  general  classes  of  models  that  had  been  suggested  for  use 
with  multidimensional  data. 

The first class of models considered consisted of extensions of the
general model proposed by Rasch (1961). This model, in its most general form,
is given by


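$$P(x_{ij} \mid \boldsymbol{\theta}_j, \boldsymbol{\sigma}_i) = \frac{1}{\gamma(\boldsymbol{\theta}_j, \boldsymbol{\sigma}_i)} \exp\left[\boldsymbol{\phi}(x_{ij})'\boldsymbol{\theta}_j + \boldsymbol{\psi}(x_{ij})'\boldsymbol{\sigma}_i + \boldsymbol{\theta}_j'\boldsymbol{\chi}(x_{ij})\boldsymbol{\sigma}_i + \rho(x_{ij})\right] \tag{1}$$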

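where $P(x_{ij} \mid \boldsymbol{\theta}_j, \boldsymbol{\sigma}_i)$ is the probability of response $x_{ij}$ given the values of
the vector parameters $\boldsymbol{\theta}_j$ and $\boldsymbol{\sigma}_i$; $\boldsymbol{\theta}_j$ is a vector of parameters that describes the
characteristics of person $j$; $\boldsymbol{\sigma}_i$ is a vector of parameters that describes item
$i$; and $\gamma(\boldsymbol{\theta}_j, \boldsymbol{\sigma}_i)$ is a normalizing function defined by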


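$$\gamma(\boldsymbol{\theta}_j, \boldsymbol{\sigma}_i) = \sum_{x} \exp\left[\boldsymbol{\phi}(x)'\boldsymbol{\theta}_j + \boldsymbol{\psi}(x)'\boldsymbol{\sigma}_i + \boldsymbol{\theta}_j'\boldsymbol{\chi}(x)\boldsymbol{\sigma}_i + \rho(x)\right] \tag{2}$$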


that ensures that the probabilities of all possible responses to the item sum
to 1.0; $\boldsymbol{\phi}(x_{ij})$ is a vector of scoring weights that indicates the
value to be given to each response to the item when estimating the ability
parameters; $\boldsymbol{\psi}(x_{ij})$ is a vector of scoring weights that indicates the value
to be given to each response to the item when estimating the item parameters;
$\boldsymbol{\chi}(x_{ij})$ is a matrix of scoring weights that indicates the value to be given
to different products of the elements of $\boldsymbol{\theta}_j$ and $\boldsymbol{\sigma}_i$; and $\rho(x_{ij})$ is a
constant that is used to set the origin of the linear function defined by the
exponent. This equation defines a very general class of models that specifies
the dimensionality of the complete latent space by a linear function in the
exponent of the logistic model form. Note that this model allows one ability
to compensate for another in the metric of $\boldsymbol{\theta}_j$. That is, a high value of one
element of $\boldsymbol{\theta}_j$ can compensate for a low value of another element in the linear
function of $\boldsymbol{\theta}_j$ defined by

$$\phi_1(x_{ij})\theta_{j1} + \phi_2(x_{ij})\theta_{j2} + \cdots + \phi_m(x_{ij})\theta_{jm} \tag{3}$$


The  same  type  of  linear  compensation  is  present  for  the  item  parameters. 
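The following sketch evaluates Equations 1 and 2 for a dichotomously scored item and two abilities, normalizing over the response categories. The weights $\boldsymbol{\phi}(0) = \boldsymbol{\psi}(0) = \mathbf{0}$, $\boldsymbol{\phi}(1) = (1, 1)$, $\boldsymbol{\psi}(1) = (1, 0)$, $\boldsymbol{\chi} = \mathbf{0}$, and $\rho = 0$ are hypothetical values chosen only for illustration. Because the ability weights are equal, any two ability vectors with the same sum $\theta_{j1} + \theta_{j2}$ receive the same probabilities, which is the linear compensation just described.

```python
import numpy as np

def general_rasch_probs(theta, sigma, phi, psi, chi, rho):
    """Response-category probabilities under the general Rasch model (Eq. 1).

    For each category x the exponent is phi[x]'theta + psi[x]'sigma
    + theta' chi[x] sigma + rho[x]; dividing by the sum over categories
    plays the role of the normalizing function gamma (Eq. 2).
    """
    exponents = np.array([
        phi[x] @ theta + psi[x] @ sigma + theta @ chi[x] @ sigma + rho[x]
        for x in range(len(phi))
    ])
    weights = np.exp(exponents)
    return weights / weights.sum()

# Hypothetical weights for a dichotomous item (x = 0, 1) and two abilities.
phi = [np.zeros(2), np.array([1.0, 1.0])]     # ability weights
psi = [np.zeros(2), np.array([1.0, 0.0])]     # item-parameter weights
chi = [np.zeros((2, 2)), np.zeros((2, 2))]    # no product terms
rho = [0.0, 0.0]
sigma = np.array([0.5, 0.0])                  # hypothetical item parameters

# Different ability vectors with the same phi[1]'theta give the same probabilities.
for theta in ([1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]):
    print(theta, general_rasch_probs(np.array(theta), sigma, phi, psi, chi, rho))
```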

The  second  class  of  models  considered  was  proposed  by  Mulaik  (1972). 
This  class  of  models  is  of  the  form 


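$$P(x_{ij} \mid \boldsymbol{\theta}_j, \boldsymbol{\sigma}_i) = \frac{\left[\sum_{k=1}^{m} e^{(\theta_{jk} + \sigma_{ik})}\right]^{x_{ij}}}{1 + \sum_{k=1}^{m} e^{(\theta_{jk} + \sigma_{ik})}} \tag{4}$$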


where $x_{ij} = 0, 1$; $m$ is the number of dimensions; and all of the other terms
have been defined previously. This model specifies the dimensionality of the
complete latent space as a sum of exponential terms. Ability and item
parameters can also compensate for each other in this model, but the
compensation occurs on an exponential scale. An interesting point to note is
that if each exponent is zero in this model, the probability of a correct
response is $m/(m + 1)$. Thus, as the number of dimensions, $m$, increases, the
probability of a correct response increases unless all of the person and item
parameters are rescaled. For the model presented in Equation 1, the
probability is always .5 when the exponent is zero.
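A quick numeric check of the $m/(m + 1)$ result, assuming the probability of a correct response is the sum of the $m$ terms $e^{(\theta_{jk} + \sigma_{ik})}$ divided by one plus that sum. The function name is hypothetical. With every exponent at zero each term equals 1, so the probability climbs toward 1 as dimensions are added, whereas the Equation 1 model stays at .5.

```python
import math

def mulaik_p_correct(theta, sigma):
    """P(x = 1): the sum of m terms exp(theta_k + sigma_k) divided by
    one plus that sum (assumed sum-of-exponentials form)."""
    s = sum(math.exp(t + g) for t, g in zip(theta, sigma))
    return s / (1.0 + s)

# With every exponent equal to zero, each term is 1, so P = m / (m + 1).
for m in (1, 2, 3, 5):
    zeros = [0.0] * m
    print(m, mulaik_p_correct(zeros, zeros), m / (m + 1))
```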

The  third  class  of  models  that  was  considered  was  proposed  by  Sympson 
(1978)  and  in  a  slightly  different  form  by  Whitely  (1980).  This  class  of 
models  is  of  the  general  form  given  by 


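$$P(x_{ij} = 1 \mid \boldsymbol{\theta}_j, \mathbf{a}_i, \mathbf{b}_i, c_i) = c_i + (1 - c_i) \prod_{k=1}^{m} \frac{e^{a_{ik}(\theta_{jk} - b_{ik})}}{1 + e^{a_{ik}(\theta_{jk} - b_{ik})}} \tag{5}$$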


where $\mathbf{a}_i$ is a vector of discrimination parameters, $\mathbf{b}_i$ is a vector of
difficulty parameters, $c_i$ is the lower asymptote of the probability function,
and all of the other terms have been defined previously. This class of models
determines the probability of a response based on abilities in a
multidimensional space as the product of a series of probability-like terms.
These terms are, in effect, the probability of the response to the item if the
item required only the one dimension. The overall probability is the product
of the probabilities on each dimension. If the exponent is zero on each
dimension, the probability will be $c_i + (1 - c_i)(.5)^m$. Thus, the probability
of a correct response will be reduced as each additional dimension is
included, unless the parameters are rescaled for each level of dimensionality.
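The contrast with the compensatory models can be illustrated with a short sketch of Equation 5, assuming the product form $c_i + (1 - c_i)$ times one logistic term $e^{a_{ik}(\theta_{jk} - b_{ik})}/(1 + e^{a_{ik}(\theta_{jk} - b_{ik})})$ per dimension. The parameter values are hypothetical. The first loop reproduces $c_i + (1 - c_i)(.5)^m$; the last two calls show that a very high ability on one dimension does not make up for a very low ability on the other.

```python
import math

def logistic(z):
    """exp(z) / (1 + exp(z)), written in the numerically equivalent form."""
    return 1.0 / (1.0 + math.exp(-z))

def sympson_p_correct(theta, a, b, c):
    """Eq. 5: c + (1 - c) times the product of one logistic term per dimension."""
    product = 1.0
    for t, ak, bk in zip(theta, a, b):
        product *= logistic(ak * (t - bk))
    return c + (1.0 - c) * product

c = 0.2  # hypothetical lower asymptote

# Zero exponents on every dimension: P = c + (1 - c)(.5)^m shrinks as m grows.
for m in (1, 2, 3):
    print(m, sympson_p_correct([0.0] * m, [1.0] * m, [0.0] * m, c))

# A high ability on one dimension does not offset a low one on the other.
print(sympson_p_correct([3.0, -3.0], [1.0, 1.0], [0.0, 0.0], c))
print(sympson_p_correct([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], c))
```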

Since the models given in Equations 4 and 5 both require a rescaling of
the ability scales with each change in dimensionality, and because both of
these models present some very difficult problems in parameter estimation,
they were removed from initial consideration, and the model presented in
Equation 1 became the focus of the research effort.
Analysis  of  the  General  Rasch  Model 

The model presented in Equation 1 defines a very rich class of special
cases. By selectively setting the weight functions to zero, many different
models can be derived, each of which has different properties. Each
of these special cases was studied both through a mathematical analysis of the
equation for the model and through a statistical analysis of simulated data
generated using the model. The results of these analyses were reported in a
technical report and in a series of papers presented at professional
meetings. The full references to the report and the papers are given below.

McKinley, R. L. and Reckase, M. D. (1982). The use of the general Rasch model
with multidimensional item response data (Research Report ONR 82-1). Iowa
City, IA: The American College Testing Program.

McKinley, R. L. and Reckase, M. D. (1982, March). Multidimensional latent
trait models. Paper presented at the meeting of the National Council on
Measurement in Education, New York.

McKinley, R. L. and Reckase, M. D. (1982, May). An analysis of the
characteristics of a family of IRT models. Paper presented at the meeting
of the Psychometric Society, Montreal.


The results of these analyses showed that two special cases of the
general Rasch model were capable of modeling realistic multidimensional item
response data. The first case uses only the $\boldsymbol{\theta}_j'\boldsymbol{\chi}(x_{ij})\boldsymbol{\sigma}_i$ and
$\boldsymbol{\psi}(x_{ij})'\boldsymbol{\sigma}_i$ terms of the general model. The weights for the other terms
were set to zero. The model for this case is given by


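$$P(x_{ij} = 1 \mid \boldsymbol{\theta}_j, \boldsymbol{\sigma}_i) = \frac{1}{\gamma(\boldsymbol{\theta}_j, \boldsymbol{\sigma}_i)} \exp\left(\sum_{k=1}^{m} \sigma_{ik}\theta_{jk} + \sum_{k=1}^{m} \sigma_{i,m+k}\right) \tag{6}$$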


where  the  symbols  have  been  defined  earlier.  This  form  of  the  model  can  be 
written  in  the  more  familiar  form  given  by 


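$$P(x_{ij} = 1 \mid \boldsymbol{\theta}_j, \mathbf{a}_i, d_i) = \frac{e^{\left(\sum_{k=1}^{m} a_{ik}\theta_{jk} + d_i\right)}}{1 + e^{\left(\sum_{k=1}^{m} a_{ik}\theta_{jk} + d_i\right)}} \tag{7}$$

where

$$a_{ik} = \sigma_{ik}, \qquad d_i = -\sum_{k=1}^{m} a_{ik} b_{ik} = \sum_{k=1}^{m} \sigma_{i,m+k}, \qquad \gamma(\boldsymbol{\theta}_j, \boldsymbol{\sigma}_i) = 1 + e^{\left(\sum_{k=1}^{m} a_{ik}\theta_{jk} + d_i\right)},$$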


and $a_{ik}$ and $b_{ik}$ can be interpreted as the a- and b-parameters from
unidimensional IRT models. Equation 7 can also be thought of as a
multidimensional extension of the two-parameter logistic model; therefore, it
has been labelled the M2PL model.
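A minimal sketch of the M2PL probability, assuming the form $e^{(\sum_k a_{ik}\theta_{jk} + d_i)}/(1 + e^{(\sum_k a_{ik}\theta_{jk} + d_i)})$ and taking $d_i = -\sum_k a_{ik} b_{ik}$ so that each term matches the unidimensional $a(\theta - b)$ parameterization. Item values are hypothetical. The two examinees in the example differ on both abilities but have the same weighted sum, so they receive the same probability; with a single dimension the function reduces to the 2PL without the D constant.

```python
import numpy as np

def m2pl_p(theta, a, d):
    """Eq. 7: P(x = 1) = exp(a'theta + d) / (1 + exp(a'theta + d))."""
    z = np.dot(a, theta) + d
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical item with two dimensions.
a = np.array([1.2, 0.6])
b = np.array([0.5, -0.25])
d = -np.dot(a, b)   # assumes d_i = -sum_k a_ik * b_ik

# Compensation: both examinees have a'theta = 1.2, so their probabilities match.
print(m2pl_p([1.0, 0.0], a, d), m2pl_p([0.0, 2.0], a, d))

# With one dimension the M2PL reduces to the 2PL without the D constant.
print(m2pl_p([0.8], [1.5], -1.5 * 0.2), 1.0 / (1.0 + np.exp(-1.5 * (0.8 - 0.2))))
```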

The second special case of the general Rasch model that was found to
model multidimensional item response data uses only the $\boldsymbol{\phi}(x_{ij})'\boldsymbol{\theta}_j$
and $\boldsymbol{\psi}(x_{ij})'\boldsymbol{\sigma}_i$ terms from the general model. This model is of the form


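$$P(x_{ij} \mid \boldsymbol{\theta}_j, \boldsymbol{\sigma}_i) = \frac{1}{\gamma(\boldsymbol{\theta}_j, \boldsymbol{\sigma}_i)} \exp\left[\boldsymbol{\phi}(x_{ij})'\boldsymbol{\theta}_j + \boldsymbol{\psi}(x_{ij})'\boldsymbol{\sigma}_i\right] \tag{8}$$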
where all of the terms have been defined previously. This model has been
labelled the "cluster model" because, in order for it to model multidimensional
data, $x_{ij}$ must be the response string for a cluster of items rather than the
response to a single item. If the item cluster contains two dichotomously
scored items, the possible responses would be 0,0; 0,1; 1,0; and 1,1. For
each of these responses, a different weight function would be available for
the $\boldsymbol{\theta}$- and $\boldsymbol{\sigma}$-vectors.
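The cluster idea can be sketched for two dichotomous items, where each of the four response strings carries its own $\boldsymbol{\phi}$ and $\boldsymbol{\psi}$ weights and the probabilities are normalized over the four strings, as in Equation 8. The weights below are hypothetical: item 1 responds only to ability 1 and item 2 only to ability 2, in which case the cluster model reduces to two independent Rasch-type items. A weight for the string 1,1 that is not the sum of the weights for 1,0 and 0,1 would instead make the two responses within the cluster dependent.

```python
import itertools
import numpy as np

# The four response strings for a cluster of two dichotomous items.
patterns = list(itertools.product((0, 1), repeat=2))   # (0,0), (0,1), (1,0), (1,1)

# Hypothetical weights: one phi (ability) and one psi (item) vector per string.
# Here item 1 loads only on ability 1 and item 2 only on ability 2.
phi = {pat: np.array(pat, dtype=float) for pat in patterns}
psi = {pat: np.array(pat, dtype=float) for pat in patterns}

def cluster_probs(theta, sigma):
    """Eq. 8: P(x) proportional to exp(phi(x)'theta + psi(x)'sigma) over the strings."""
    w = np.array([np.exp(phi[p] @ theta + psi[p] @ sigma) for p in patterns])
    return dict(zip(patterns, w / w.sum()))

theta = np.array([0.5, 1.0])    # hypothetical abilities
sigma = np.array([0.2, -0.4])   # hypothetical item-cluster parameters
probs = cluster_probs(theta, sigma)
print(probs)
```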

Although  the  cluster  model  was  very  promising,  it  had  one  difficulty  that 
made  it  less  attractive.  In  order  to  use  the  model,  items  had  to  be 
clustered,  and  no  rigorous  means  for  doing  the  clustering  has  been 
developed.  Therefore,  research  efforts  concentrated  on  the  M2PL  model. 

Estimation  of  Model  Parameters 

In  order  for  a  model  to  be  useful,  it  must  be  possible  to  estimate  the 
parameters  of  the  model.  Once  the  M2PL  model  was  selected  as  the  model  for 
further  research  efforts,  work  was  begun  on  developing  procedures  for 
estimating  the  model  parameters.  Two  different  approaches  were  taken  to  solve 
the  estimation  problem:  (a)  unconditional  maximum  likelihood,  and  (b) 
conditional  maximum  likelihood.  Once  computer  programs  were  developed  for 
these  two  approaches,  they  were  validated  using  both  simulated  test  data 
generated  from  the  M2PL  model,  and  real  test  data  that  were  selected  because 
of  their  multivariate  properties.  The  estimation  procedures  and  the  results 
of  the  program  validation  studies  were  presented  in  the  publications  and 
papers  listed  below. 


McKinley, R. L. and Reckase, M. D. (1983). MAXLOG: A computer program for
the estimation of the parameters of a multidimensional logistic model.
Behavior Research Methods and Instrumentation, 15(3), 389-390.

McKinley, R. L. and Reckase, M. D. (1983). An application of a
multidimensional extension of the two-parameter logistic latent trait model
(Research Report ONR83-3). Iowa City, IA: The American College Testing
Program.

Reckase, M. D. and McKinley, R. L. (1982, July). Some latent trait theory in
a multidimensional latent space. Paper presented at the Invitational
Conference on IRT/CAT, Wayzata, MN.

Reckase, M. D. and McKinley, R. L. (1982, August). The feasibility of a
multidimensional latent trait model. Paper presented at the meeting of the
American Psychological Association, Washington, D.C.

McKinley, R. L. (1983, April). A multidimensional extension of the
two-parameter logistic latent trait model. Paper presented at the meeting of
the National Council on Measurement in Education, Montreal.

McKinley, R. L. and Reckase, M. D. (1983, April). The use of IRT analysis on
dichotomous data from multidimensional tests. Paper presented at the
meeting of the American Educational Research Association, Montreal.
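The programs cited above are not reproduced here. As a rough illustration only, the sketch below performs one piece of an unconditional (joint) maximum likelihood approach: it estimates a single examinee's ability vector by Newton-Raphson under the M2PL, treating the item parameters as known. In a joint procedure such a step would alternate with an analogous update of the item parameters. The simulated item parameters, the random seed, and the function names are assumptions, and this is not a reconstruction of MAXLOG.

```python
import numpy as np

def estimate_theta(responses, A, d, n_iter=30):
    """Newton-Raphson ML estimate of one examinee's ability vector under the
    M2PL, treating item parameters A (items x dims) and d as known."""
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ theta + d)))
        grad = A.T @ (responses - p)               # first derivatives of log L
        hess = -(A.T * (p * (1.0 - p))) @ A        # second derivatives of log L
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def log_likelihood(theta, responses, A, d):
    p = 1.0 / (1.0 + np.exp(-(A @ theta + d)))
    return np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

# Simulated data from hypothetical item parameters: half the items load
# mainly on dimension 1, half mainly on dimension 2.
rng = np.random.default_rng(0)
A = np.vstack([np.column_stack([rng.uniform(0.8, 1.6, 100), rng.uniform(0.0, 0.3, 100)]),
               np.column_stack([rng.uniform(0.0, 0.3, 100), rng.uniform(0.8, 1.6, 100)])])
d = rng.normal(0.0, 1.0, 200)
true_theta = np.array([0.5, -0.3])
p_true = 1.0 / (1.0 + np.exp(-(A @ true_theta + d)))
responses = (rng.random(200) < p_true).astype(float)

theta_hat = estimate_theta(responses, A, d)
print(true_theta, theta_hat)
```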
real test data that should be hierarchically related. However, the upper and
lower asymptotes did not appear to be needed for the particular real data set
that was analyzed. Further studies need to be done to determine whether this
is a general finding applicable to all hierarchically arranged modules or
whether it applies only to this case. If the c- and e-parameters are not
needed, the model can be simplified to a two-parameter logistic model.

One problem with the use of the model became evident with the analysis of
the real test data. In order to accurately estimate the parameters of the
model, examinees must be routed to the higher level unit of instruction even
when they have not performed well on the lower level unit. This is poor
educational practice and, in many cases, this data collection procedure cannot
be followed. This makes it difficult to obtain data for use in estimating the
parameters of the model. It may be that the model will have to be modified to
accommodate the routing procedures that are currently being used in modularized
instructional programs.
k scale specified by the b-parameter is the suggested decision point on module
k for routing to module j if misclassification errors in either direction are
considered equally serious.
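Assuming Equation 9 has the four-parameter form $P_j(\theta_{ik}) = c_j + (e_j - c_j)\,e^{Da_j(\theta_{ik} - b_j)}/(1 + e^{Da_j(\theta_{ik} - b_j)})$ with $D = 1.7$, the curve sits exactly halfway between its asymptotes $c_j$ and $e_j$ at $\theta_{ik} = b_j$, the point described above as the routing decision point. The sketch below evaluates the curve with hypothetical parameter values; `route_to_module_j` is a hypothetical helper, not a procedure from the report.

```python
import math

D = 1.7

def module_curve(theta_k, a, b, c, e):
    """Probability of passing module j given achievement theta_k on module k,
    assuming P = c + (e - c) * exp(D a (theta_k - b)) / (1 + exp(D a (theta_k - b)))."""
    z = D * a * (theta_k - b)
    return c + (e - c) / (1.0 + math.exp(-z))

# Hypothetical parameter values for one pair of modules.
a, b, c, e = 1.1, 0.3, 0.15, 0.9

# At theta_k = b the curve is halfway between its asymptotes c and e.
print(module_curve(b, a, b, c, e), (c + e) / 2)

def route_to_module_j(theta_k, b):
    """Hypothetical routing rule: advance when achievement on module k reaches b."""
    return theta_k >= b
```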

In order to evaluate this model, it was applied to both simulated and
real test data to determine whether the estimation procedures worked properly
and whether the model realistically represented actual test results. The
outcomes of these studies were presented in the following documents.

McKinley, R. L. and Reckase, M. D. (1984). A latent trait model for
sequentially arranged units of instruction. Iowa City, IA: The American
College Testing Program.

McKinley, R. L. and Reckase, M. D. (1984, April). A latent trait model for
use with sequentially arranged units of instruction. Paper presented at
the meeting of the American Educational Research Association, New Orleans.

The studies showed that the parameters of the model could be accurately
estimated and that, for one set of real test data, the model gave very
reasonable results. There were some indications, however, that the upper and
lower asymptote parameters might not be needed. It may be possible to
simplify the model to a two-parameter logistic form.

Summary  and  Conclusions 

A model for the relationship between modules of instruction that are
hierarchically related was proposed and evaluated using both simulated and
real test data. The results of the studies showed that the model parameters
could be accurately estimated and that the model was a good representation of


where $P_j(\theta_{ik})$ is the probability of passing module $j$ given the level of
performance $\theta_{ik}$ of examinee $i$ on prerequisite module $k$; $c_j$ is the probability
of passing module $j$ if the examinee has not acquired any knowledge in module
$k$; $e_j$ is the probability of passing module $j$ if the examinee has mastered
module $k$; $D = 1.7$; $a_j$ is a parameter related to the strength of the
relationship between the two modules; and $b_j$ is the difficulty of the passing
score used on module $j$. This model predicts the probability that an examinee
will pass module $j$ based on his/her performance on module $k$.

In order to use this model, estimates of achievement are first obtained
on module k. This can be done either by analyzing the module k test using an
IRT model or by converting the raw scores on module k to z-scores. These
achievement measures are then used as known values, and the model parameters
are estimated using a maximum likelihood estimation procedure.
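The estimation step can be sketched for the simplified two-parameter form that the results suggest may be adequate ($c_j = 0$, $e_j = 1$). Raw module k scores are converted to z-scores, treated as known, and $a_j$ and $b_j$ are found by Newton-Raphson maximum likelihood; because $Da_j(\theta - b_j) = \beta_0 + \beta_1\theta$ with $\beta_1 = Da_j$ and $\beta_0 = -Da_j b_j$, this is an ordinary logistic regression. The data are simulated with hypothetical true values $a_j = 1.0$ and $b_j = 0.5$. The full four-parameter model would need a general-purpose optimizer and is not shown.

```python
import numpy as np

D = 1.7

def fit_two_parameter(z_scores, passed, n_iter=25):
    """ML fit of P(pass j | theta_k) = 1 / (1 + exp(-D a (theta_k - b))) by
    Newton-Raphson, using the logit beta0 + beta1 * theta_k with
    beta1 = D a and beta0 = -D a b."""
    X = np.column_stack([np.ones_like(z_scores), z_scores])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (passed - p)
        hess = -(X.T * (p * (1.0 - p))) @ X
        beta = beta - np.linalg.solve(hess, grad)
    a_hat = beta[1] / D
    b_hat = -beta[0] / beta[1]
    return a_hat, b_hat

# Simulated example with hypothetical true values a = 1.0, b = 0.5.
rng = np.random.default_rng(1)
raw = rng.normal(50, 10, size=5000)            # hypothetical module k raw scores
z = (raw - raw.mean()) / raw.std()             # convert raw scores to z-scores
p_true = 1.0 / (1.0 + np.exp(-D * 1.0 * (z - 0.5)))
passed = (rng.random(5000) < p_true).astype(float)
a_hat, b_hat = fit_two_parameter(z, passed)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
```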

A very low a-parameter estimate is an indication that the two modules are
not very highly related. A high a-value indicates that knowledge on module k
is very important for module j. A high estimate for the c-parameter indicates
that examinees can perform well on module j even without mastering module k.
A low c-value indicates that an examinee cannot perform well on module j
unless knowledge has been acquired on module k.

Estimates of the e-parameter indicate the maximum probability of passing
module j given that the examinee has mastered module k. Low values
indicate that module k contains only a small portion of the information needed
to pass module j. High values indicate that module k includes most of the
information needed to pass module j.

The b-parameter estimates indicate the point on the module k scale that
best distinguishes between persons who pass or fail module j. This point will
change with changes in the passing score on module j. The point on the module
coefficients of dependence were found to provide insufficient [...]
validating the sequence of instructional units, or for [...]
scores. The procedures based on mathematical models were [...]
potential, but the currently available procedures did not [...]
needs of instructional programs. There seemed to be a clear need for a
procedure that could be used to arrange units of instruction [...]
based upon the prerequisite knowledge required by each unit of instruction,
and that could be used to set passing scores for each unit that would [...]
the efficiency and accuracy of the routing process. The model proposed and
evaluated during this research effort was designed to perform these functions.

The  Module  Characteristic  Curve  Model 

The basic idea behind the proposed model for the interrelationship
between modules of instruction is that if two modules form a learning
hierarchy, performance on the higher level instructional module is dependent
upon prerequisite knowledge obtained from the lower level module of
instruction. Thus, if sufficient knowledge has not been gained on the lower
level module, a high level of performance cannot be exhibited on the higher
level module of instruction. This implies that success on the higher module
is related to the level of performance on the lower module.

The  probabilistic  model  that  was  hypothesized  to  describe  the 
relationship  between  hierarchically  related  instructional  modules  is  given  by 


$$P_j(\theta_{ik}) = c_j + (e_j - c_j)\,\frac{e^{Da_j(\theta_{ik} - b_j)}}{1 + e^{Da_j(\theta_{ik} - b_j)}} \tag{9}$$


scores  on  the  tests  are  used  to  route  the  students  through  the  units  of 


instruction.  The  purpose  of  this  component  of  the  project  was  to  evaluate  an 
IRT-type  model  that  had  potential  for  assisting  in  determining  the 
interrelationships  between  the  instructional  units  and  in  determining  the 
decision  points  that  should  be  used  with  each  unit  test  to  minimize  routing 
error.  The  model  treats  each  unit,  or  module,  of  instruction  as  a  complex 
item  and  hypothesizes  a  particular  mathematical  form  for  the  interrelationship 
between  performance  on  one  module  and  the  probability  of  successfully  passing 
the  next  module  in  the  instructional  program. 

The  first  step  in  the  evaluation  of  this  model  for  performance  in 
instructional  programs  was  to  review  the  literature  in  the  area  called 
"learning  hierarchies"  to  determine  what  procedures  were  currently  being  used 
to  evaluate  the  interrelationships  between  units  of  instruction  and  to  set 
passing  scores  on  the  unit  tests.  The  information  obtained  from  the  review 
would  serve  as  a  basis  for  comparison  for  the  results  obtained  from  the 
proposed  model.  The  review  of  the  literature  was  presented  in  the  following 
report.

Reckase,  M.  D.  and  McKinley,  R.  L.  (1982).  The  validation  of  learning 
hierarchies  (Research  Report  ONR  82-2).  Iowa  City,  IA:  The  American 
College  Testing  Program. 

The review of the literature indicated that there were two general types
of procedures that had been used to indicate the relationships between
instructional units: those based on coefficients of dependence, and those
based on a more complete description of the relationships between units of
instruction, usually a mathematical model. The procedures based on


multidimensional  extension  of  the  two-parameter  logistic  model  was  selected  as 
a  promising  model  for  future  work.  Estimation  procedures  were  developed  for 
this  model  and  the  results  were  validated  using  simulated  and  real  test 
data.  A  theoretical  foundation  was  laid  for  an  interpretation  of  the  item 
parameters  of  the  MIRT  models,  and  definitions  of  multidimensional  item 
difficulty,  discrimination,  and  information  were  developed.  At  this  point,  a 
sufficient  framework  has  been  developed  to  make  multidimensional  item  response 
theory  a  viable  technique. 

Although  substantial  advances  have  been  made  in  the  area  of  MIRT,  even 
more  work  is  left  to  be  done.  The  current  estimation  programs  require 
excessive  amounts  of  computer  time  when  more  than  two  or  three  dimensions  are 
specified  for  a  model.  Work  needs  to  be  done  to  make  estimation  of  the 
parameters  more  efficient.  Procedures  are  needed  to  determine  the  appropriate 
number  of  dimensions  for  a  set  of  test  data,  and  procedures  for  indicating  the 
fit  of  the  models  to  the  data  are  needed.  A  related  question  is  whether  the 
M2PL  model  is  an  accurate  representation  of  the  interaction  between  a  person 
and  an  item.  This  model  implies  that  one  ability  can  compensate  for 
another.  Perhaps  a  model  of  this  type  is  not  appropriate.  These  and  other 
questions  will  be  addressed  in  future  work. 


Models  for  Performance  on 
Hierarchically  Structured  Training  Materials 


Programs of instruction are often composed of many short, homogeneous
instructional units that have been arranged according to the logical
relationships of the content. In many cases, short tests are given to
[...] of competence on a unit of instruction, and the


The second point that became evident was that the locus of points of
inflection could change with the direction taken relative to the surface in
the multidimensional space. This is a direct consequence of the fact that the
slope at a point on the item response surface (IRS) is different in different
directions. The direction in the space is one way of indicating the composite
of abilities that is of interest.
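For the M2PL model the direction dependence of the slope has a simple form. A minimal sketch, assuming the M2PL and expressing a direction by its angles $\alpha_k$ with the ability axes: the slope at $\boldsymbol{\theta}$ in that direction is $P(1 - P)\sum_k a_k \cos\alpha_k$. The item parameters are hypothetical; the example shows a steeper slope along the axis with the larger a-value.

```python
import numpy as np

def m2pl_p(theta, a, d):
    """M2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

def directional_slope(theta, a, d, angles_deg):
    """Slope of the M2PL response surface at theta in the direction that makes
    the given angles (degrees) with the ability axes: P(1 - P) * sum_k a_k cos(alpha_k)."""
    cosines = np.cos(np.radians(angles_deg))
    p = m2pl_p(theta, a, d)
    return p * (1.0 - p) * np.dot(a, cosines)

a, d = np.array([1.5, 0.5]), -0.8    # hypothetical item parameters
theta = np.zeros(2)

# In two dimensions a direction at angle alpha to axis 1 is at 90 - alpha to axis 2.
for alpha in (0, 45, 90):
    print(alpha, directional_slope(theta, a, d, [alpha, 90 - alpha]))
```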

In  order  to  take  these  two  points  into  account,  a  definition  of 
multidimensional  difficulty  was  derived  that  was  based  upon  a  vector 
conceptualization.  The  multidimensional  difficulty  of  an  item  was  defined  as 
the  direction  from  the  origin  of  the  multidimensional  space  to  the  point  of 
steepest  slope  and  the  distance  from  the  origin  to  the  point  of  steepest 
slope.  Discrimination  of  an  item  was  related  to  the  slope  in  the  difficulty 
direction  at  the  point  of  the  steepest  slope.  Information  was  also  given  a 
directional interpretation. For a group centered at the origin of the space,
an  item  is  most  informative  in  the  difficulty  direction.  The  item  information 
can  also  be  determined  in  any  other  direction,  but  the  maximum  information 
will  be  less  than  in  the  direction  indicated  by  the  multidimensional 
difficulty. 
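The vector definitions can be made concrete for the M2PL model with a short sketch. The sketch assumes the logistic form P = 1/(1 + exp(-(a'θ + d))). Under that form, the item's discrimination is MDISC = sqrt(Σ a_k²), which is four times the steepest slope. The direction cosines are a_k/MDISC, and the signed distance from the origin to the point of steepest slope is MDIFF = -d/MDISC. The sketch also assumes the directional information expression P(1 - P)(Σ a_k cos α_k)². The names `item_summary` and `directional_information` and the parameter values are hypothetical.

```python
import math

def m2pl_prob(a, d, theta):
    """Logistic M2PL probability of a correct response."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def item_summary(a, d):
    """Return MDISC, MDIFF, direction cosines, and angles (degrees) for an item."""
    mdisc = math.sqrt(sum(ak * ak for ak in a))          # length of the a-vector
    cosines = [ak / mdisc for ak in a]                    # direction of steepest slope
    angles = [math.degrees(math.acos(c)) for c in cosines]
    mdiff = -d / mdisc                                    # signed distance from the origin
    return mdisc, mdiff, cosines, angles

def directional_information(a, d, theta, cosines):
    """Item information at theta in the direction given by unit direction cosines."""
    p = m2pl_prob(a, d, theta)
    return p * (1.0 - p) * sum(ak * ck for ak, ck in zip(a, cosines)) ** 2

a, d = [1.2, 0.8], -0.5                                   # hypothetical item parameters
mdisc, mdiff, cosines, angles = item_summary(a, d)
steepest_point = [mdiff * c for c in cosines]             # P = 0.5 at this point
info_difficulty_dir = directional_information(a, d, steepest_point, cosines)
info_theta2_axis = directional_information(a, d, steepest_point, [0.0, 1.0])
print(round(mdisc, 3), round(mdiff, 3), [round(x, 1) for x in angles])
print(round(info_difficulty_dir, 3), round(info_theta2_axis, 3))
```

With these values, MDISC is about 1.44. MDIFF is about 0.35, in the direction about 33.7 degrees from the θ1 axis. At the point of steepest slope, the information is 0.52 in the difficulty direction and only 0.16 along the θ2 axis.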

The  definitions  of  multidimensional  difficulty,  discrimination,  and 
information  are  general  enough  that  they  apply  to  any  MIRT  model  that  is 
monotonically increasing in probability with an increase in any ability
dimension. The definitions also include the unidimensional definitions as
special cases.
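The special-case claim can be checked numerically for the M2PL model. A unidimensional two-parameter logistic item is usually written with the exponent a(θ - b), which corresponds to d = -ab in the a'θ + d parameterization. With a single positive discrimination, the multidimensional difficulty then reduces to b and MDISC reduces to a. The function name and values below are hypothetical.

```python
import math

def mdiff_mdisc(a, d):
    """Multidimensional difficulty and discrimination for an M2PL item."""
    mdisc = math.sqrt(sum(ak * ak for ak in a))
    return -d / mdisc, mdisc

# A one-dimensional item with a = 1.5 and b = 0.8, rewritten with d = -a * b
a_1d, b_1d = 1.5, 0.8
mdiff, mdisc = mdiff_mdisc([a_1d], -a_1d * b_1d)
print(round(mdiff, 6), round(mdisc, 6))
```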

Summary  and  Conclusions 

This portion of the research project accomplished several important tasks
in the development of MIRT. A number of models were analyzed and the


Reckase, M. D. and McKinley, R. L. (1983, April). The definition of
difficulty and discrimination for multidimensional item response theory
models. Paper presented at the meeting of the American Educational
Research Association, Montreal.

Reckase,  M.  D.  and  McKinley,  R.  L.  (1983,  June).  The  item  difficulty  concept 
generalized  to  the  multidimensional  latent  space.  Paper  presented  at  the 
meeting  of  the  Psychometric  Society,  Los  Angeles. 

Reckase,  M.  D.  and  McKinley,  R.  L.  (1984,  June).  Multidimensional  difficulty 
as  a  direction  and  a  distance.  Paper  presented  at  the  meeting  of  the 
Psychometric  Society,  Santa  Barbara,  CA. 

Initial  work  in  this  area  concentrated  on  deriving  a  direct 
generalization  of  the  interpretations  of  the  difficulty  and  discrimination 
parameters  and  item  and  test  information  from  the  unidimensional  item  response 
theory models to the MIRT models. Since the difficulty of an item was defined
for the unidimensional models as the point on the ability scale corresponding
to the point of inflection of the item characteristic curve, multidimensional
difficulty was conceptually thought of as the point of inflection of the
multidimensional item response surface (IRS). An analysis of this approach
quickly made two important points evident. First, for an IRS there is not a
single point of inflection, but rather a locus of points of inflection.
Depending upon the MIRT model and the dimensionality being considered, this
locus of points of inflection could be a straight line, a curve, a hyperplane,
or  a  hypersurface.  The  complexity  of  the  locus  of  points  of  inflection  made 
its  practical  application  difficult. 
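For the M2PL model in particular, the locus is the hyperplane a'θ + d = 0, which in two dimensions is a straight line. Every point on it has P = 0.5 and is a point of inflection along the direction of the a-vector. The sketch below checks this numerically with a central second difference. The item parameters, step size, and helper names are hypothetical choices.

```python
import math

def m2pl_prob(a, d, theta):
    """Logistic M2PL probability of a correct response."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

a, d = [1.2, 0.8], -0.5                                   # hypothetical item parameters
norm = math.sqrt(sum(ak * ak for ak in a))
unit = [ak / norm for ak in a]                            # unit vector along a

def second_difference(theta, h=1e-3):
    """Central second difference of P along the a-direction at theta."""
    up = [t + h * u for t, u in zip(theta, unit)]
    down = [t - h * u for t, u in zip(theta, unit)]
    return (m2pl_prob(a, d, up) - 2.0 * m2pl_prob(a, d, theta)
            + m2pl_prob(a, d, down)) / h ** 2

# Points on the line a1*theta1 + a2*theta2 + d = 0
for theta1 in (-2.0, 0.0, 2.0):
    theta2 = -(d + a[0] * theta1) / a[1]
    point = [theta1, theta2]
    print(point, round(m2pl_prob(a, d, point), 6), abs(second_difference(point)) < 1e-6)
```

Off the line, for example at θ = (1, 1), the second difference is clearly nonzero, so the inflection points are exactly the points on the line.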
The study showed that the dimensionality of both the items and the
examinee population was important in interpreting the results of an M2PL
analysis. If each item were a relatively pure measure of an ability, the
procedure obtained good estimates of the ability parameters, even when they
were correlated. However, as the correlation between the abilities increased,
there  was  some  deterioration  of  the  accuracy  of  the  estimates.  When  each  item 
measured  more  than  one  ability,  the  effect  of  correlated  abilities  was  more 
extreme.  As  the  correlation  between  abilities  increased,  the  M2PL  solution 
tended  to  collapse  to  a  single  dimension.  The  results  seemed  to  imply  the 
need  for  procedures  for  oblique  rotations  to  improve  the  recovery  of  the 
ability  dimensions. 

Interpretation  of  the  Model  Parameters 

When  a  MIRT  model  is  used,  estimates  can  be  obtained  for  the  ability  and 
the  item  parameters.  The  ability  parameter  estimates  can  be  interpreted  in  a 
fairly  straightforward  manner  as  the  amount  of  ability  a  person  has  on  each 
dimension.  The  item  parameter  estimates,  however,  do  not  have  the  same 
intuitive  meaning.  Therefore,  a  major  part  of  this  project  dealt  with 
determining  the  MIRT  model  analogs  to  the  unidimensional  IRT  item  parameters 
and  the  measures  of  quality,  such  as  item  and  test  information.  The  results 
of  the  work  in  this  area  were  presented  in  the  following  documents. 

McKinley, R. L. and Reckase, M. D. (1983). An extension of the two-parameter
logistic model to the multidimensional latent space (Research Report ONR83-2).
Iowa City, IA: The American College Testing Program.


The  results  of  these  studies  showed  that  both  the  unconditional  and 
conditional  maximum  likelihood  procedures  could  be  used  to  estimate  the  item 
and  ability  parameters  of  the  M2PL  model,  but  that  the  unconditional  maximum 
likelihood  procedure  required  somewhat  less  computer  time.  However,  both 
procedures  require  fairly  extensive  computer  facilities,  and  as  the  number  of 
dimensions  in  the  model  increased,  the  computer  time  required  became 
prohibitive.  It  was  clear  that  improved  estimation  procedures  were  needed  if 
the  M2PL  model  was  to  be  widely  used. 
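The sketch below shows what a single ability update involves. In such procedures, each examinee's θ is found by maximizing the likelihood with the item parameters held fixed. The sketch assumes known, hypothetical item parameters and uses plain gradient ascent on the log-likelihood, whose gradient for the M2PL is Σ_i (u_i - P_i) a_i. Gradient ascent is a simplification chosen for brevity and is not necessarily the algorithm used in the project. In a joint (unconditional) scheme, this step would typically be alternated with an item-parameter step across all examinees and items.

```python
import math

def m2pl_prob(a, d, theta):
    """Logistic M2PL probability of a correct response."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood_gradient(items, responses, theta):
    """Gradient of the log-likelihood in theta: sum over items of (u - P) * a."""
    grad = [0.0] * len(theta)
    for (a, d), u in zip(items, responses):
        residual = u - m2pl_prob(a, d, theta)
        for k, ak in enumerate(a):
            grad[k] += residual * ak
    return grad

def estimate_theta(items, responses, step=0.5, iterations=500):
    """Maximum likelihood ability estimate by plain gradient ascent,
    with the item parameters treated as known."""
    theta = [0.0] * len(items[0][0])
    for _ in range(iterations):
        grad = log_likelihood_gradient(items, responses, theta)
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical (a, d) pairs and one examinee's 0/1 responses
items = [([1.0, 0.0], 0.0), ([1.0, 0.0], -1.0),
         ([0.0, 1.0], 0.5), ([0.0, 1.0], -0.5),
         ([0.7, 0.7], 0.0)]
responses = [1, 0, 1, 0, 1]
theta_hat = estimate_theta(items, responses)
print([round(t, 3) for t in theta_hat])
```

The response pattern was chosen so that a finite estimate exists. With an all-correct or all-incorrect pattern, the likelihood has no maximum and the iterations would drift without bound. In this hypothetical set, the two pairs of single-dimension items differ only by a shift of 0.5 in their intercepts, so the two ability estimates differ by 0.5.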

The  validation  of  the  estimation  procedures  yielded  uniformly  good 
results  when  simulated  test  results  were  used.  However,  when  real  test  data 
were analyzed, the results were inconsistent. Some studies gave readily
interpretable results that were in many ways similar to factor analytic
results. In other studies anomalies appeared, such as highly negatively
correlated  ability  estimates  that  suggested  that  added  constraints  were  needed 
to  control  the  estimation  process. 

In  order  to  study  the  estimation  process  in  more  detail,  the  M2PL 
procedure  was  used  to  analyze  simulated  test  data  that  had  been  produced  using 
a  multivariate  ability  distribution  that  had  varying  degrees  of  correlation 
between  the  abilities.  The  results  of  the  study  were  presented  in  the 
following  report. 

McKinley, R. L. and Reckase, M. D. (1984). An investigation of the effect of
correlated abilities on observed test characteristics (Research Report
ONR84-1). Iowa City, IA: The American College Testing Program.
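The simulation design described above, with abilities drawn from a multivariate distribution with varying degrees of correlation, can be sketched as follows. The sketch assumes a standard bivariate normal ability distribution, produced from independent normal deviates through the Cholesky factor of the correlation matrix, together with hypothetical M2PL item parameters. The report's actual generating distributions and items are not reproduced here. Changing `rho` produces data sets with different degrees of correlation between the abilities.

```python
import math
import random

def m2pl_prob(a, d, theta):
    """Logistic M2PL probability of a correct response."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def simulate(items, n_examinees, rho, seed=0):
    """Draw abilities from a standard bivariate normal with correlation rho
    and generate 0/1 responses to each (a, d) item under the M2PL model."""
    rng = random.Random(seed)
    thetas, responses = [], []
    for _ in range(n_examinees):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] applied to (z1, z2)
        theta = [z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2]
        thetas.append(theta)
        responses.append([1 if rng.random() < m2pl_prob(a, d, theta) else 0
                          for a, d in items])
    return thetas, responses

def correlation(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

items = [([1.2, 0.1], 0.0), ([0.1, 1.1], -0.3), ([0.8, 0.8], 0.5)]   # hypothetical
thetas, responses = simulate(items, n_examinees=5000, rho=0.6, seed=1)
r = correlation([t[0] for t in thetas], [t[1] for t in thetas])
print(round(r, 2))
```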